You can view the usage status of all AI EasyMaker resources in the dashboard.
Displays the number of resources in use per resource.
Create and manage Jupyter notebooks that come with the essential packages for machine learning development installed.
Create a Jupyter notebook.
Image: Select OS image to be installed on the notebook instance.
Notebook Information
Storage
/root/easymaker directory path. Data on this storage is retained even when the notebook is restarted.
nas://{NAS ID}:/{path}
Additional Settings
[Caution] When using NHN Cloud NAS: Only NHN Cloud NAS created in the same project as AI EasyMaker can be used.
[Note] Time to create notebooks: Notebooks can take several minutes to create. Creating the first resources (notebooks, training, experiments, endpoints) takes a few additional minutes to configure the service environment.
A list of notebooks is displayed. Select a notebook in the list to check details and make changes to it.
Status: Status of the notebook is displayed. Please refer to the table below for the main status.
Status | Description |
---|---|
CREATE REQUESTED | Notebook creation is requested. |
CREATE IN PROGRESS | Notebook instance is in the process of creation. |
ACTIVE (HEALTHY) | Notebook application is in normal operation. |
ACTIVE (UNHEALTHY) | Notebook application is not operating properly. If this condition persists after restarting the notebook, please contact customer service center. |
STOP IN PROGRESS | Notebook stop in progress. |
STOPPED | Notebook stopped. |
START IN PROGRESS | Notebook start in progress. |
DELETE IN PROGRESS | Notebook delete in progress. |
CREATE FAILED | Failed to create notebook. If creation keeps failing, please contact the customer service center. |
STOP FAILED | Failed to stop notebook. Please try to stop again. |
START FAILED | Failed to start notebook. Please try to start again. |
DELETE FAILED | Failed to delete notebook. Please try to delete again. |
Action > Open Jupyter Notebook: Click Open Jupyter Notebook button to open the notebook in a new browser window. The notebook is only accessible to users who are logged in to the console.
Tag: Tag for notebook is displayed. You can change the tag by clicking Change.
Monitoring: On the Monitoring tab of the detail screen that appears when you select the notebook, you can see a list of monitored instances and a chart of basic metrics.
AI EasyMaker notebook instance provides a default Conda virtual environment with various libraries and kernels required for machine learning.
The default Conda virtual environment is initialized and run when the notebook is stopped and started, but virtual environments and external libraries that the user installs in an arbitrary path are not automatically initialized and are not retained when the notebook is stopped and started.
To resolve this issue, you must create a virtual environment in the /root/easymaker/custom-conda-envs directory path and install external libraries in the created virtual environment.
AI EasyMaker notebook instance allows the virtual environment created in the /root/easymaker/custom-conda-envs directory path to be initialized and run when the notebook is stopped and started.
Please refer to the following guide to configure your virtual environment.
Go to the /root/easymaker/custom-conda-envs path.
cd /root/easymaker/custom-conda-envs
To create a virtual environment called easymaker_env with Python 3.8, run the conda create command as follows:
conda create --prefix ./easymaker_env python=3.8
You can check the created virtual environment with the conda env list command.
(base) root@nb-xxxxxx-0:~# conda env list
# conda environments:
#
/opt/intel/oneapi/intelpython/latest
/opt/intel/oneapi/intelpython/latest/envs/2022.2.1
base * /opt/miniconda3
easymaker_env /root/easymaker/custom-conda-envs/easymaker_env
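As a quick check of the steps above, the following sketch picks out the environments that live under /root/easymaker/custom-conda-envs from text printed by conda env list. The helper name persistent_envs and the sample text are illustrative assumptions, not part of AI EasyMaker.

```python
from pathlib import PurePosixPath
from typing import List

# Directory that this guide says is retained across notebook stop/start
PERSISTENT_ROOT = PurePosixPath("/root/easymaker/custom-conda-envs")


def persistent_envs(conda_env_list_output: str) -> List[str]:
    """Return environment paths located under PERSISTENT_ROOT.

    Hypothetical helper: it reads the last whitespace-separated token of
    each non-comment line, which is where `conda env list` prints the path.
    """
    found = []
    for line in conda_env_list_output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        path = PurePosixPath(line.split()[-1])
        if PERSISTENT_ROOT in path.parents:
            found.append(str(path))
    return found


# Sample text shaped like the output shown above
sample = """
# conda environments:
#
base * /opt/miniconda3
easymaker_env /root/easymaker/custom-conda-envs/easymaker_env
"""
print(persistent_envs(sample))
```

Environments that do not appear in the result are the ones that may be lost when the notebook is stopped and started.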
Stop the running notebook or start the stopped notebook.
[Caution] How to retain your virtual environment and external libraries when starting the notebook after stopping it: When the notebook is stopped and started, virtual environments and external libraries that the user created may be reset. To retain them, configure your virtual environment by referring to User Virtual Execution Environment Configuration.
[Note] Time to start and stop notebooks: It may take several minutes to start and stop notebooks.
Change the instance flavor of the created notebook. The instance flavor can only be changed to a flavor of the same core type as the existing instance.
[Note] Time to change instance flavors: It may take several minutes to change the instance flavor.
Delete the created notebook.
[Note] Storage: When a notebook is deleted, its boot storage and data storage are deleted. Connected NHN Cloud NAS is not deleted and must be deleted separately from NHN Cloud NAS.
Experiments are managed by grouping related trainings into experiments.
[Note] Experiment creation time: Creating experiments can take several minutes. Creating the first resources (notebooks, training, experiments, endpoints) takes a few additional minutes to configure the service environment.
A list of experiments is displayed. Select an experiment to view and modify detailed information.
Status: The experiment status is displayed. Please refer to the table below for the main status.
Status | Description |
---|---|
CREATE REQUESTED | Creating an experiment is requested. |
CREATE IN PROGRESS | An experiment is being created. |
CREATE FAILED | Failed to create an experiment. Please try again. |
ACTIVE | The experiment is successfully created. |
Operation
Delete an experiment.
[Note] Unable to delete an experiment if associated training exists: An experiment cannot be deleted if there is a training associated with it. Delete the associated training first, then delete the experiment. You can check the associated trainings by clicking the [Training] tab in the detail screen displayed at the bottom when you click the experiment you want to delete.
Provides a training environment where you can train machine learning algorithms and evaluate them based on training results.
Set the training environment by selecting the instance and OS image to train on, and proceed with training by entering the algorithm information and the input/output data paths.
Algorithm information: Enter information about the algorithm you want to train.
Own Algorithm : Uses an algorithm written by the user.
algorithm path
entry point
Image : Choose an image for your instance that matches the environment in which you need to run your training.
Training Resource Information
Input Data
[Caution] When using NHN Cloud NAS: Only NHN Cloud NAS created in the same project as AI EasyMaker can be used.
[Caution] Training failure when deleting training input data: Training may fail if the input data is deleted before training is completed.
A list of trainings is displayed. Select a training from the list to check detailed information and change it.
Status: Shows the status of training. Please refer to the table below for the main status.
Status | Description |
---|---|
CREATE REQUESTED | You have requested to create a training. |
CREATE IN PROGRESS | This is a state in which resources necessary for training are being created. |
RUNNING | Training is in progress. |
STOPPED | Training is stopped at the user's request. |
COMPLETE | Training has been completed normally. |
STOP IN PROGRESS | Training is stopping. |
FAIL TRAIN | This is a failed state during training. Detailed failure information can be checked through the Log & Crash Search log when log management is enabled. |
CREATE FAILED | The training creation failed. If creation continues to fail, please contact customer service. |
FAIL TRAIN IN PROGRESS, COMPLETE IN PROGRESS | The resources used for training are being cleaned up. |
Operation
Hyperparameters : You can check the hyperparameter values set for training on the hyperparameter tab of the detailed screen displayed when selecting training.
Monitoring: When you select a training, you can see a list of monitored instances and basic metrics charts in the Monitoring tab of the detailed screen that appears.
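When a script or checklist needs to decide whether a training is still changing state, the statuses in the training status table above can be grouped into two sets. The sketch below is one possible grouping; the split into "in progress" and "finished" and the function name is_finished are assumptions made for illustration, not an AI EasyMaker API.

```python
# Status names copied from the training status table above.
# The grouping into two sets is an assumption made for this sketch.
IN_PROGRESS = {
    "CREATE REQUESTED",
    "CREATE IN PROGRESS",
    "RUNNING",
    "STOP IN PROGRESS",
    "FAIL TRAIN IN PROGRESS",
    "COMPLETE IN PROGRESS",
}
FINISHED = {"STOPPED", "COMPLETE", "FAIL TRAIN", "CREATE FAILED"}


def is_finished(status: str) -> bool:
    """Return True if the status is not expected to change on its own."""
    normalized = status.strip().upper()
    if normalized in FINISHED:
        return True
    if normalized in IN_PROGRESS:
        return False
    # Unknown names are rejected instead of being silently classified.
    raise ValueError(f"unknown training status: {status!r}")


print(is_finished("COMPLETE"), is_finished("RUNNING"))
```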
Create a new training with the same settings as an existing training.
Create a model with training in the completed state.
Delete a training.
[Note] Training cannot be deleted if a related model exists: A training cannot be deleted if a model created from it exists. Delete the model first and then the training.
Hyperparameter tuning is the process of optimizing hyperparameter values to maximize a model's predictive accuracy. If you don't use this feature, you'll have to manually tune the hyperparameters to find the optimal values while running many training jobs yourself.
How to configure a hyperparameter tuning job.
[Caution] When using NHN Cloud NAS: Only NHN Cloud NAS created in the same project as AI EasyMaker can be used.
[Caution] Training failure when deleting training input data: Training may fail if the input data is deleted before training is completed.
A list of hyperparameter tunings is displayed. Select a hyperparameter tuning from the list to view details and change information.
Status : Shows the status of hyperparameter tuning. Please refer to the table below for the main status.
Status | Description |
---|---|
CREATE REQUESTED | Requested to create hyperparameter tuning. |
CREATE IN PROGRESS | Resources required for hyperparameter tuning are being created. |
RUNNING | Hyperparameter tuning is in progress. |
STOPPED | Hyperparameter tuning is stopped at the user's request. |
COMPLETE | Hyperparameter tuning has been successfully completed. |
STOP IN PROGRESS | Hyperparameter tuning is stopping. |
FAIL HYPERPARAMETER TUNING | A failed state during hyperparameter tuning in progress. Detailed failure information can be checked through the Log & Crash Search log when log management is enabled. |
CREATE FAILED | Hyperparameter tuning generation failed. If creation continues to fail, please contact customer service. |
FAIL HYPERPARAMETER TUNING IN PROGRESS, COMPLETE IN PROGRESS, STOP IN PROGRESS | Resources used for hyperparameter tuning are being cleaned up. |
Status Details: The bracketed content in the COMPLETE status is the status details. See the table below for key details.
Details | Description |
---|---|
GoalReached | Details when training for hyperparameter tuning is complete by reaching the target value. |
MaxTrialsReached | Details when hyperparameter tuning has reached the maximum number of training runs and is complete. |
SuggestionEndReached | Details when the exploration algorithm in Hyperparameter Tuning has explored all hyperparameters. |
Operation
Monitoring: When you select hyperparameter tuning, you can check the list of monitored instances and basic indicator charts in the Monitoring tab of the detailed screen that appears.
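To make the three status details of the COMPLETE status (GoalReached, MaxTrialsReached, SuggestionEndReached) concrete, the sketch below maps a few summary numbers of a tuning run to one of them. It only illustrates the meaning of each detail; the function name, its parameters, and the order in which the conditions are checked are assumptions, and how AI EasyMaker evaluates them internally is not described in this guide.

```python
from typing import Optional


def completion_detail(
    best_metric: float,
    goal: float,
    trials_run: int,
    max_trials: int,
    search_exhausted: bool,
    maximize: bool = True,
) -> Optional[str]:
    """Return the status detail that would explain why tuning completed.

    Hypothetical helper; the check order below is an assumption.
    """
    goal_reached = best_metric >= goal if maximize else best_metric <= goal
    if goal_reached:
        return "GoalReached"  # target metric value was reached
    if trials_run >= max_trials:
        return "MaxTrialsReached"  # maximum number of trainings was used up
    if search_exhausted:
        return "SuggestionEndReached"  # search algorithm has nothing left to try
    return None  # none of the completion conditions hold yet


print(completion_detail(0.95, 0.9, 4, 10, False))
```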
Displays a list of trainings auto-generated by hyperparameter tuning. Select a training from the list to check detailed information.
Status : Shows the status of the training automatically generated by hyperparameter tuning. Please refer to the table below for the main status.
Status | Description |
---|---|
CREATED | Training has been created. |
RUNNING | Training is in progress. |
SUCCEEDED | Training has been completed normally. |
KILLED | Training is stopped by the system. |
FAILED | This is a failed state during training. Detailed failure information can be checked through the Log & Crash Search log when log management is enabled. |
METRICS_UNAVAILABLE | This is a state where target metrics cannot be collected. |
EARLY_STOPPED | Training was stopped early because performance (the target metric) stopped improving while training was in progress. |
Create a new hyperparameter tuning with the same settings as the existing hyperparameter tuning.
Create a model with the best training of hyperparameter tuning in the completed state.
Delete a hyperparameter tuning.
[Note] Hyperparameter tuning cannot be deleted if an associated model exists: A hyperparameter tuning cannot be deleted if a model created from it exists. Delete the model first, then the hyperparameter tuning.
By creating a training template in advance, you can import the values entered into the template when creating training or hyperparameter tuning.
For information on what you can set in your training template, see Creating a training.
Displays a list of training templates. Select a training template from the list to view details and change information.
Create a new training template with the same settings as an existing training template.
Delete the training template.
You can manage models produced by AI EasyMaker training, or external models, as artifacts.
obs://{Object Storage API endpoint}/{containerName}/{path}
nas://{NAS ID}:/{path}
[Caution] When using NHN Cloud NAS: Only NHN Cloud NAS created in the same project as AI EasyMaker can be used.
[Caution] Retain model artifacts in storage: If the model artifacts are not retained in storage, creating endpoints for that model fails.
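The two path formats above, obs://{Object Storage API endpoint}/{containerName}/{path} and nas://{NAS ID}:/{path}, can be assembled with plain string formatting. The sketch below shows one way to do that; the function names and the example values are placeholders chosen for illustration.

```python
def obs_path(api_endpoint: str, container_name: str, path: str) -> str:
    """Build obs://{Object Storage API endpoint}/{containerName}/{path}."""
    # Strip stray slashes so the parts are joined by exactly one "/".
    return "obs://{}/{}/{}".format(
        api_endpoint.strip("/"), container_name.strip("/"), path.lstrip("/")
    )


def nas_path(nas_id: str, path: str) -> str:
    """Build nas://{NAS ID}:/{path}."""
    return "nas://{}:/{}".format(nas_id, path.lstrip("/"))


# Placeholder values, not real endpoints or IDs
print(obs_path("object-storage.example.com", "models", "/iris/v1"))
print(nas_path("nas-id-123", "models/iris"))
```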
A list of models is displayed. Select a model in the list to check detailed information and make changes to it.
Create an endpoint that can serve the selected model.
Delete a model.
[Note] Unable to delete a model if an associated endpoint exists: You cannot delete a model if an endpoint created from it exists. Delete the endpoint created from the model first, then delete the model.
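The deletion notes in this guide form a chain: an experiment is blocked by its trainings, a training or hyperparameter tuning is blocked by its models, and a model is blocked by its endpoints. The sketch below turns those rules into a deletion order. The dictionary and function name are illustrative assumptions; only the blocking rules come from the notes.

```python
from typing import Dict, List

# "X is blocked by Y" means Y must be deleted before X (from the [Note] entries).
BLOCKED_BY: Dict[str, List[str]] = {
    "experiment": ["training"],
    "training": ["model"],
    "hyperparameter tuning": ["model"],
    "model": ["endpoint"],
    "endpoint": [],
}


def deletion_order(resource_type: str) -> List[str]:
    """Return resource types in the order they have to be deleted."""
    order: List[str] = []

    def visit(kind: str) -> None:
        for blocker in BLOCKED_BY[kind]:
            visit(blocker)  # delete whatever blocks this type first
        if kind not in order:
            order.append(kind)

    visit(resource_type)
    return order


print(deletion_order("experiment"))
```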
Create and manage endpoints that can serve the model.
/inference, you can request the inference API at POST https://{endpoint-domain}/inference.
[Note] Time to create endpoints: Endpoint creation can take several minutes. Creating the first resources (notebooks, training, experiments, endpoints) takes a few additional minutes to configure the service environment.
[Note] Restrictions on API Gateway service resource provision when creating endpoints: When you create a new endpoint, a new API Gateway service is created. Adding a new stage to an existing endpoint creates a new stage in the API Gateway service. If you exceed the limits in the API Gateway Service Resource Provision Policy, you might not be able to create endpoints in AI EasyMaker. In this case, adjust the API Gateway service resource quota.
A list of endpoints is displayed. Select an endpoint in the list to check details and make changes to the information.
Status: Status of endpoint. Please refer to the table below for main status.
Status | Description |
---|---|
CREATE REQUESTED | Endpoint creation is requested. |
CREATE IN PROGRESS | Endpoint creation is in progress. |
UPDATE IN PROGRESS | Some endpoint stages have tasks in progress. You can check the task status of each stage in the endpoint stage list. |
DELETE IN PROGRESS | Endpoint deletion is in progress. |
ACTIVE | Endpoint is in normal operation. |
CREATE FAILED | Endpoint creation has failed. You must delete and recreate the endpoint. If the creation fails repeatedly, please contact the Customer Center. |
UPDATE FAILED | Some endpoint stages are not being served properly. You must delete and recreate the stages with issues. |
API Gateway Status: Displays API Gateway status information for default stage of endpoint. Please refer to the table below for main status.
Status | Description |
---|---|
CREATE IN PROGRESS | API Gateway Resource creation in progress. |
STAGE DEPLOYING | API Gateway default stage deploying in progress. |
ACTIVE | API Gateway default stage is successfully deployed and activated. |
NOT FOUND: STAGE | Default stage for endpoint is not found. Please check if the stage exists in API Gateway console. If the stage is deleted, the deleted API Gateway stage cannot be recovered, and the endpoint has to be deleted and recreated. |
NOT FOUND: STAGE DEPLOY RESULT | The deployment status of the endpoint default stage is not found. Please check if the default stage is deployed in API Gateway console. |
STAGE DEPLOY FAIL | API Gateway default stage has failed to deploy. [Note] To recover from the failed deployment state, refer to Recovery method when the stage's API Gateway is in 'Deployment Failed' status. |
Add a new stage to an existing endpoint. You can create and test the new stage without affecting the default stage.
A list of stages created under the endpoint is displayed. Select a stage in the list to check more information.
Status: Displays status of endpoint stage. Please refer to the table below for main status.
Status | Description |
---|---|
CREATE REQUESTED | Endpoint stage creation requested. |
CREATE IN PROGRESS | Endpoint stage creation is in progress. |
DEPLOY IN PROGRESS | Model deployment to the endpoint stage is in progress. |
DELETE IN PROGRESS | Endpoint stage deletion is in progress. |
ACTIVE | Endpoint stage is in normal operation. |
CREATE FAILED | Endpoint stage creation has failed. Please try again. |
DEPLOY FAILED | Deployment to the endpoint stage has failed. Please try again. |
API Gateway Status: Displays stage status of API Gateway from where endpoint stage is deployed.
[Caution] Precautions when changing settings for API Gateway created by AI EasyMaker: When creating an endpoint or an endpoint stage, AI EasyMaker creates API Gateway services and stages for the endpoint. Note the following precautions when changing API Gateway services and stages created by AI EasyMaker directly from the API Gateway service console.
1. Avoid deleting API Gateway services and stages created by AI EasyMaker. Deletion may prevent the endpoint from displaying API Gateway information correctly, and changes made to the endpoint may not be applied to API Gateway.
2. Avoid changing or deleting resources in the API Gateway resource path that was entered when creating endpoints. Doing so may cause the endpoint's inference API call to fail.
3. Avoid adding resources in the API Gateway resource path that was entered when creating endpoints. The added resources may be deleted when adding or changing endpoint stages.
4. In the stage settings of API Gateway, do not disable Backend Endpoint Url Redefinition or change the URL set in the API Gateway resource path. If you change the URL, the endpoint's inference API call might fail.
Apart from the precautions above, you can use other settings provided by API Gateway as necessary. For more information about how to use API Gateway, refer to API Gateway Console Guide.
[Note] Recovery method when the stage's API Gateway is in 'Deployment Failed' status: If the stage settings of an AI EasyMaker endpoint are not deployed to the API Gateway stage due to a temporary issue, the deployment status is displayed as failed. In this case, you can deploy the API Gateway stage manually by selecting the stage from the stage list and clicking View API Gateway Settings > Deploy Stage in the detail screen at the bottom. If this does not recover the deployment status, please contact the Customer Center.
Add a new resource to an existing endpoint stage.
/inference, you can request the inference API with POST https://{endpoint-domain}/inference.
A list of resources created under the endpoint stage is displayed.
Status : Shows the status of stage resource. Please refer to the table below for the main status.
Status | Description |
---|---|
CREATE REQUESTED | Creating stage resource requested. |
CREATE IN PROGRESS | Stage resource is being created. |
DELETE IN PROGRESS | Stage resource is being deleted. |
ACTIVE | Stage resource is deployed normally. |
CREATE FAILED | Creating stage resource failed. Please try again. |
Model Name: The name of the model deployed to the stage.
// Inference API example: Request
curl --location --request POST '{API Gateway Resource Path}' \
--header 'Content-Type: application/json' \
--data-raw '{
"instances": [
[6.8, 2.8, 4.8, 1.4],
[6.0, 3.4, 4.5, 1.6]
]
}'
// Inference API Example: Response
{
"predictions" : [
[
0.337502569,
0.332836747,
0.329660654
],
[
0.337530434,
0.332806051,
0.329663515
]
]
}
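The same request can be prepared from Python. The sketch below builds the POST request with the standard library and reads a response shaped like the example above. The URL is a placeholder, and treating each row of predictions as per-class scores is an assumption that depends on the deployed model.

```python
import json
import urllib.request


def build_inference_request(url: str, instances: list) -> urllib.request.Request:
    """Prepare the POST request shown in the curl example (does not send it)."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predicted_classes(response_body: str) -> list:
    """Return the index of the highest score in each prediction row.

    Assumes each row holds per-class scores, which depends on the model.
    """
    predictions = json.loads(response_body)["predictions"]
    return [row.index(max(row)) for row in predictions]


# Placeholder URL: replace it with your own API Gateway resource path.
request = build_inference_request(
    "https://example.invalid/inference",
    [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]],
)
# To send the request: urllib.request.urlopen(request).read().decode("utf-8")
```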
Change the default stage of the endpoint to another stage. To change the model of an endpoint without service interruption, AI EasyMaker recommends deploying the model using stage capabilities.
[Caution] Delete stage of API Gateway service when deleting the endpoint stage: Deleting an endpoint stage in AI EasyMaker also deletes the stage in the API Gateway service to which the endpoint stage is deployed. Note that if an API is running on the API Gateway stage to be deleted, API calls can no longer be made.
Delete an endpoint.
[Caution] Delete API Gateway service when deleting the endpoint: Deleting an endpoint in AI EasyMaker also deletes the API Gateway service to which the endpoint was deployed. Note that if an API is running on the API Gateway service to be deleted, API calls can no longer be made.
Provides an environment to make batch inferences from an AI EasyMaker model and view inference results in statistics.
Set up the environment in which batch inference will be performed by selecting an instance and OS image, and enter the paths to the input/output data to be inferred to proceed with batch inference.
[Caution] When using NHN Cloud NAS: Only NHN Cloud NAS created in the same project as AI EasyMaker can be used.
[Caution] Batch inference fails when batch inference input data is deleted: Batch inference can fail if you delete input data before batch inference is complete.
[Caution] When setting input data detailed options: If the Glob pattern is not entered properly, batch inference may not work properly because the input data cannot be found. When used together with the Include Glob pattern, the Exclude Glob pattern takes precedence.
[Caution] When setting batch options: You must set the batch size and inference timeout appropriately based on the performance of the model used for batch inference. If the settings you enter are incorrect, batch inference might not perform well enough.
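The precedence rule for the Include and Exclude Glob patterns can be illustrated with Python's fnmatch module. This is a sketch of the rule only: the function name and file names are made up, and the pattern syntax that AI EasyMaker accepts may differ from fnmatch (for example, in how `/` is matched).

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def select_inputs(
    paths: Iterable[str],
    include: Optional[str] = None,
    exclude: Optional[str] = None,
) -> List[str]:
    """Keep paths that match `include` unless they also match `exclude`."""
    selected = []
    for path in paths:
        # Exclude is checked first, so it wins when both patterns match.
        if exclude is not None and fnmatchcase(path, exclude):
            continue
        if include is None or fnmatchcase(path, include):
            selected.append(path)
    return selected


files = ["data/a.csv", "data/b.csv", "data/b.tmp.csv", "data/readme.txt"]
print(select_inputs(files, include="*.csv", exclude="*.tmp.csv"))
```

Here data/b.tmp.csv matches both patterns and is dropped, because the exclude pattern takes precedence.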
Displays a list of batch inferences. Select a batch inference from the list to check the details and change the information.
Status : Displays the status of batch inference. Please refer to the table below for the main status.
Status | Description |
---|---|
CREATE REQUESTED | You have requested to create a batch inference. |
CREATE IN PROGRESS | This is a state in which resources necessary for batch inference are being created. |
RUNNING | Batch inference is in progress. |
STOPPED | Batch inference is stopped at the user's request. |
COMPLETE | Batch inference has been completed successfully. |
STOP IN PROGRESS | Batch inference is stopping. |
FAIL BATCH INFERENCE | This is a failed state during batch inference. Detailed failure information can be checked through the Log & Crash Search log when log management is enabled. |
CREATE FAILED | The batch inference creation failed. If creation continues to fail, please contact customer service. |
FAIL BATCH INFERENCE IN PROGRESS, COMPLETE IN PROGRESS | The resources used for batch inference are being cleaned up. |
Operation
Stop: You can stop batch inference in progress.
Monitoring: When you select a batch inference, you can check the list of monitored instances and basic indicator charts in the Monitoring tab of the detailed screen that appears.
The Monitoring tab is disabled while batch inference is being created.
Create a new batch inference with the same settings as an existing batch inference.
Delete a batch inference.
User-customized container images can be used to run notebooks, training, and hyperparameter tuning. Only private images derived from the notebook/deep learning images provided by AI EasyMaker can be used when creating resources in AI EasyMaker. See the table below for the base images in AI EasyMaker.
Image Name | CoreType | Framework | Framework version | Python version | Image address |
---|---|---|---|---|---|
Ubuntu 22.04 CPU Python Notebook | CPU | Python | 3.10.12 | 3.10 | fb34a0a4-kr1-registry.container.nhncloud.com/easymaker/python-notebook:3.10.12-cpu-py310-ubuntu2204 |
Ubuntu 22.04 GPU Python Notebook | GPU | Python | 3.10.12 | 3.10 | fb34a0a4-kr1-registry.container.nhncloud.com/easymaker/python-notebook:3.10.12-gpu-py310-ubuntu2204 |
Ubuntu 22.04 CPU PyTorch Notebook | CPU | PyTorch | 2.0.1 | 3.10 | fb34a0a4-kr1-registry.container.nhncloud.com/easymaker/pytorch-notebook:2.0.1-cpu-py310-ubuntu2204 |
Ubuntu 22.04 GPU PyTorch Notebook | GPU | PyTorch | 2.0.1 | 3.10 | fb34a0a4-kr1-registry.container.nhncloud.com/easymaker/pytorch-notebook:2.0.1-gpu-py310-ubuntu2204 |
Ubuntu 22.04 CPU TensorFlow Notebook | CPU | TensorFlow | 2.12.0 | 3.10 | fb34a0a4-kr1-registry.container.nhncloud.com/easymaker/tensorflow-notebook:2.12.0-cpu-py310-ubuntu2204 |
Ubuntu 22.04 GPU TensorFlow Notebook | GPU | TensorFlow | 2.12.0 | 3.10 | fb34a0a4-kr1-registry.container.nhncloud.com/easymaker/tensorflow-notebook:2.12.0-gpu-py310-ubuntu2204 |
Image Name | CoreType | Framework | Framework version | Python version | Image address |
---|---|---|---|---|---|
Ubuntu 22.04 CPU PyTorch Training | CPU | PyTorch | 2.0.1 | 3.10 | fb34a0a4-kr1-registry.container.nhncloud.com/easymaker/pytorch-train:2.0.1-cpu-py310-ubuntu2204 |
Ubuntu 22.04 GPU PyTorch Training | GPU | PyTorch | 2.0.1 | 3.10 | fb34a0a4-kr1-registry.container.nhncloud.com/easymaker/pytorch-train:2.0.1-gpu-py310-ubuntu2204 |
Ubuntu 22.04 CPU TensorFlow Training | CPU | TensorFlow | 2.12.0 | 3.10 | fb34a0a4-kr1-registry.container.nhncloud.com/easymaker/tensorflow-train:2.12.0-cpu-py310-ubuntu2204 |
Ubuntu 22.04 GPU TensorFlow Training | GPU | TensorFlow | 2.12.0 | 3.10 | fb34a0a4-kr1-registry.container.nhncloud.com/easymaker/tensorflow-train:2.12.0-gpu-py310-ubuntu2204 |
[Note] Limitations on using private images
* Only private images derived from base images provided by AI EasyMaker can be used.
* Only NHN Container Registry (NCR) can be integrated as a container registry service where private images are stored. (As of December 2023)
The following explains how to build a container image from an AI EasyMaker base image using Docker and how to use the private image for notebooks in AI EasyMaker.
Create a Dockerfile for the private image.
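FROM fb34a0a4-kr1-registry.container.nhncloud.com/easymaker/python-notebook:3.10.12-cpu-py310-ubuntu2204 AS easymaker-notebook
# Create the environment non-interactively (-y), since docker build cannot answer prompts
RUN conda create -y -n example python=3.10
# "conda activate" does not carry over between RUN steps, so run pip inside the environment with "conda run"
RUN conda run -n example pip install torch torchvision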
Build a private image and push it to the container registry: Build an image with the Dockerfile and save (push) the image to the NCR registry.
docker build -t {image name}:{tag} .
docker tag {image name}:{tag} {NCR registry address}/{image name}:{tag}
docker push {NCR registry address}/{image name}:{tag}
(Example)
docker build -t custom-training:v1 .
docker tag custom-training:v1 example-kr1-registry.container.nhncloud.com/registry/custom-training:v1
docker push example-kr1-registry.container.nhncloud.com/registry/custom-training:v1
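If you script these steps, the three commands can be generated from the image name, tag, and registry address. The sketch below only builds the argument lists; the function name is hypothetical, and running the commands still requires Docker and a prior login to the registry.

```python
from typing import List


def docker_push_commands(image: str, tag: str, registry: str) -> List[List[str]]:
    """Return the build, tag, and push commands as argument lists."""
    local_ref = f"{image}:{tag}"
    remote_ref = f"{registry}/{image}:{tag}"
    return [
        ["docker", "build", "-t", local_ref, "."],
        ["docker", "tag", local_ref, remote_ref],
        ["docker", "push", remote_ref],
    ]


# Values taken from the example above
for command in docker_push_commands(
    "custom-training", "v1", "example-kr1-registry.container.nhncloud.com/registry"
):
    print(" ".join(command))
```

Each list can be passed to subprocess.run(command, check=True) to stop at the first failing step.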
In AI EasyMaker, create a private image from the image you saved (pushed) to NCR.
Create a notebook with the private image you created.
[Note] You can use your private image in the same way to create training and hyperparameter tuning resources, not only notebooks.
[Note] Container registry service: NHN Container Registry (NCR): Only the NCR service can be used as a container registry service. (As of December 2023)
Enter the following values for the account ID and password for the NCR service.
ID: User Access Key of NHN Cloud user account
Password: User Secret Key of NHN Cloud user account
For AI EasyMaker to pull an image from the user's registry where private images are stored and run the container, it must be logged in to that registry. If you save your login information as a registry account, you can reuse it for images linked to that registry account. To manage your registry accounts, go to the Image menu in the AI EasyMaker console, then select the Registry Account tab.
Create a new registry account.
[Note] When you change a registry account, AI EasyMaker signs in to the registry service with the changed username and password when using images associated with that account. If you enter an incorrect registry username or password, the login during a private image pull fails and resource creation fails. You cannot modify a registry account if resources are being created with a private image associated with it, or if training or hyperparameter tuning that uses it is in progress.
Select the registry account you want to delete from the list, and click Delete Registry Account.
[Note] You cannot delete a registry account associated with an image. To delete, delete the associated image first and then delete the registry account.
Some features of AI EasyMaker use the user's NHN Cloud Object Storage as input/output storage. For these features to work properly, you must allow the AI EasyMaker system account read or write access to the user's NHN Cloud Object Storage container.
Allowing the AI EasyMaker system account read/write permissions on the user's NHN Cloud Object Storage container means that the AI EasyMaker system account can read or write all files in the container in accordance with the permissions granted.
Check this information and set up the access policy in the user's Object Storage with only the required accounts and permissions.
The user is responsible for all consequences of allowing an account other than the AI EasyMaker system account to access the user's Object Storage during access policy setup, and AI EasyMaker is not responsible for it.
[Note] Depending on the feature, AI EasyMaker reads from or writes to Object Storage as follows.
Feature | Access Right | Access target |
---|---|---|
Training | Read | Algorithm path entered by user, training input data path |
Training | Write | User-entered training output data, checkpoint path |
Model | Read | Model artifact path entered by user |
Endpoint | Read | Model artifact path entered by user |
To add read/write permissions to AI EasyMaker system account in Object Storage, refer to the following:
Logs and events generated by the AI EasyMaker service can be stored in the NHN Cloud Log & Crash Search service. To store logs in the Log & Crash Search service, you must enable the Log & Crash Search service, and a separate usage fee is charged.
AI EasyMaker service sends logs to Log & Crash Search service in the following defined fields:
Common Log Field
Name | Description | Valid range |
---|---|---|
easymakerAppKey | AI EasyMaker Appkey(AppKey) | - |
category | Log category | easymaker.training, easymaker.inference |
logLevel | Log level | INFO, WARNING, ERROR |
body | Log contents | - |
logType | Service name provided by log | NHNCloud-AIEasyMaker |
time | Log Occurrence Time (UTC Time) | - |
Training Log Field
Name | Description |
---|---|
trainingId | AI EasyMaker training ID |
Hyperparameter Tuning Log Field
Name | Description |
---|---|
hyperparameterTuningId | AI EasyMaker hyperparameter tuning ID |
Endpoint Log Field
Name | Description |
---|---|
endpointId | AI EasyMaker Endpoint ID |
endpointStageId | Endpoint stage ID |
inferenceId | Unique ID of the inference request |
action | Action classification (Endpoint.Model) |
modelName | Model name to be inferred |
Batch Inference Log Field
Name | Description |
---|---|
batchInferenceId | AI EasyMaker batch inference ID |
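The field tables above can be put to use when post-processing stored logs. The following is a minimal sketch, assuming the logs have already been exported as a list of Python dictionaries keyed by the field names above. The export step itself, the `select_logs` helper, and the sample records are hypothetical and not part of AI EasyMaker.

```python
# Severity order taken from the "Valid range" of logLevel in the common field table.
LEVEL_ORDER = {"INFO": 0, "WARNING": 1, "ERROR": 2}


def select_logs(records, category, min_level="INFO"):
    """Return records of the given category whose logLevel is at least min_level."""
    threshold = LEVEL_ORDER[min_level]
    return [
        record
        for record in records
        if record.get("category") == category
        and LEVEL_ORDER.get(record.get("logLevel"), -1) >= threshold
    ]


# Made-up records for illustration only; real records carry every common field.
sample_records = [
    {"category": "easymaker.training", "logLevel": "INFO",
     "body": "epoch 1 done", "trainingId": "example-training-id"},
    {"category": "easymaker.training", "logLevel": "ERROR",
     "body": "out of memory", "trainingId": "example-training-id"},
    {"category": "easymaker.inference", "logLevel": "WARNING",
     "body": "slow response", "endpointId": "example-endpoint-id"},
]

training_errors = select_logs(sample_records, "easymaker.training", min_level="WARNING")
print([record["body"] for record in training_errors])
```

Filtering on `category` first keeps training and inference logs apart; the feature-specific ID fields (`trainingId`, `endpointId`, and so on) can then be used to narrow the result to a single resource.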
As shown in the example below, you can use hyperparameter values entered during training creation.
import argparse
import os

# Each hyperparameter is also available as an EM_HP_<UPPER-CASE KEY> environment variable
model_version = os.environ.get("EM_HP_MODEL_VERSION")

def parse_hyperparameters():
    parser = argparse.ArgumentParser()

    # Parsing the entered hyper parameter
    parser.add_argument("--epochs", type=int, default=500)
    parser.add_argument("--batch_size", type=int, default=32)
    ...

    return parser.parse_known_args()
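Note that `parse_known_args()` returns a tuple rather than a single namespace: the parsed arguments, followed by a list of the arguments the parser did not recognize. The sketch below shows how the result might be unpacked. The `argv` parameter and the `--learning_rate` argument are added here only so the example can run locally; they are assumptions, not part of the original script.

```python
import argparse


def parse_hyperparameters(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=500)
    parser.add_argument("--batch_size", type=int, default=32)
    # Returns (namespace, list of unrecognized arguments).
    # With argv=None, argparse reads sys.argv[1:] as usual.
    return parser.parse_known_args(argv)


# Hypothetical command line: --learning_rate is not declared on the parser,
# so it ends up in `unknown` instead of raising an error.
args, unknown = parse_hyperparameters(["--epochs", "10", "--learning_rate", "0.01"])
print(args.epochs, args.batch_size, unknown)
```

Using `parse_known_args()` instead of `parse_args()` keeps the script from failing when it receives a hyperparameter it has not declared.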
Key Environment Variables
Environment variable name | Description |
---|---|
EM_SOURCE_DIR | Absolute path to the folder where the algorithm script entered at the time of training creation is downloaded |
EM_ENTRY_POINT | Algorithm entry point name entered at training creation |
EM_DATASET_${Data set name} | Absolute path to the folder where each data set entered at the time of training creation is downloaded |
EM_DATASETS | Full data set list (JSON format) |
EM_MODEL_DIR | Model storage path |
EM_CHECKPOINT_INPUT_DIR | Input checkpoint storage path |
EM_CHECKPOINT_DIR | Checkpoint Storage Path |
EM_HP_${Hyperparameter key converted to upper case} | Hyperparameter value corresponding to the hyperparameter key |
EM_HPS | Full hyperparameter list (JSON format) |
EM_TENSORBOARD_LOG_DIR | TensorBoard log path for checking training results |
EM_REGION | Current Region Information |
EM_APPKEY | Appkey of AI EasyMaker service currently in use |
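As a minimal sketch of how the variables in the table might be read, the helpers below build the `EM_HP_` variable name for a hyperparameter key and decode the JSON-valued variables. The helper names, the `environ` parameter, and the shape of the `EM_HPS` value (a JSON object keyed by hyperparameter name) are assumptions made for illustration; check the actual values inside a training job before relying on them.

```python
import json
import os


def hp_env_name(key):
    """Environment variable name for a hyperparameter key (upper-cased, as in the table)."""
    return "EM_HP_" + key.upper()


def read_json_env(name, default, environ=os.environ):
    """Decode a JSON-valued environment variable; return `default` if it is unset or empty."""
    raw = environ.get(name)
    return json.loads(raw) if raw else default


# Stand-in for the real environment so the sketch runs outside a training job.
# The values, and the JSON layout of EM_HPS in particular, are made up.
fake_env = {
    "EM_HP_MODEL_VERSION": "v2",
    "EM_HPS": '{"model_version": "v2", "epochs": "10"}',
}

model_version = fake_env.get(hp_env_name("model_version"))
all_hps = read_json_env("EM_HPS", {}, environ=fake_env)
datasets = read_json_env("EM_DATASETS", [], environ=fake_env)  # unset here, so the default is used
print(model_version, all_hps, datasets)
```

In a real training script, omit the `environ` argument so the helpers read from `os.environ`.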
Example code for utilizing environment variables
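import os
import tensorflow

# Folder where the data set referenced by EM_DATASET_TRAIN was downloaded
dataset_dir = os.environ.get("EM_DATASET_TRAIN")
train_data = read_data(dataset_dir, "train.csv")  # read_data is a user-defined loader

model = ... # Implement the model using input data
# Restore weights from the input checkpoint path
model.load_weights(os.environ.get('EM_CHECKPOINT_INPUT_DIR', None))
callbacks = [
    # Save checkpoints under EM_CHECKPOINT_DIR
    tensorflow.keras.callbacks.ModelCheckpoint(filepath=f'{os.environ.get("EM_CHECKPOINT_DIR")}/cp-{{epoch:04d}}.ckpt', save_freq='epoch', period=50),
    # Write TensorBoard logs under EM_TENSORBOARD_LOG_DIR
    tensorflow.keras.callbacks.TensorBoard(log_dir=f'{os.environ.get("EM_TENSORBOARD_LOG_DIR")}'),
]
model.fit(..., callbacks=callbacks)

# Save the trained model to the model storage path
model_dir = os.environ.get("EM_MODEL_DIR")
model.save(model_dir)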
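The example above loads weights from `EM_CHECKPOINT_INPUT_DIR` unconditionally. A more defensive variant could first check whether there is anything to resume from. The following is a minimal sketch that assumes the variable may be unset or may point to an empty folder when no input checkpoint was specified. That behavior is an assumption, and `checkpoint_input_dir` is a hypothetical helper, not part of AI EasyMaker.

```python
import os
from pathlib import Path


def checkpoint_input_dir(environ=os.environ):
    """Return the input checkpoint folder as a Path, or None if there is nothing to resume from."""
    raw = environ.get("EM_CHECKPOINT_INPUT_DIR")
    if not raw:
        return None  # variable unset or empty
    path = Path(raw)
    if not path.is_dir() or not any(path.iterdir()):
        return None  # folder missing or contains no files
    return path


resume_dir = checkpoint_input_dir()
if resume_dir is None:
    print("No input checkpoint found; training starts from scratch.")
else:
    print(f"Resuming from {resume_dir}")
    # Load the weights here, for example with model.load_weights(...)
```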
To check result metrics on the TensorBoard screen after training, the training script must write its TensorBoard logs to the specified location (EM_TENSORBOARD_LOG_DIR).
Example code for TensorBoard log storage (TensorFlow)
import os
import tensorflow as tf

# Specify the TensorBoard log path
tb_log = tf.keras.callbacks.TensorBoard(log_dir=os.environ.get("EM_TENSORBOARD_LOG_DIR"))

model = ... # model implementation
model.fit(x_train, y_train, validation_data=(x_test, y_test),
          epochs=100, batch_size=20, callbacks=[tb_log])
TF_CONFIG required for distributed training is automatically set. For more information, please refer to the TensorFlow guide document.
Backends settings are required for distributed training. If distributed training is performed on CPU, set it to gloo, and if distributed training is performed on GPU, set it to nccl. For more information, please refer to the PyTorch guide document.
The AI EasyMaker service periodically upgrades the cluster version to provide stable service and new features. When a new cluster version is deployed, you need to move the notebooks and endpoints that are running on the old version of the cluster to the new cluster. The following explains how to move each type of resource to the new cluster.
On the Notebook list screen, notebooks that need to be moved to the new cluster display a Restart button to the left of their name. Hovering the mouse pointer over the Restart button displays restart instructions and an expiration date.
Restarts take about 25 minutes for the first run, and about 10 minutes for subsequent runs. Failed restarts are automatically reported to the administrator.
On the endpoints list screen, endpoints that need to be moved to the new cluster will have a ! Notice to the left of the name. If you hover over the ! Notice, it displays a version upgrade announcement and an expiration date. Before the expiration, you must follow these instructions to move stages running on the old version cluster to the new version cluster.
[Caution] Deleting a stage will shut down the endpoint, preventing API calls. Ensure that the stage is not in service before deleting it.
The default stage is the stage on which the actual service operates. To move the default stage to the new cluster version without disrupting the service, follow the guide below.
exit code : -9 (pid: {pid})